Abstract Energy in today’s short-range wireless communication
is mostly spent on the analog and digital hardware rather than
on radiated power. Hence, purely information-theoretic
considerations fail to achieve the lowest energy per information
bit, and the optimization process must carefully consider the
overall transceiver. In this paper, we propose to perform
cross-layer optimization, based on an energy-aware rate
adaptation scheme combined with a physical layer that is able to
properly adjust its processing effort to the data rate and the
channel conditions to minimize the energy consumption per
information bit. This energy proportional behavior is enabled by
extending the classical system modes with additional
configuration parameters at the various layers. Fine-grained
models of the power consumption of the hardware are developed to
provide awareness of the physical layer capabilities to the
medium access control layer. The joint application of the
proposed energy-aware rate adaptation and modifications to the
physical layer of an IEEE 802.11n system improves
energy-efficiency (averaged over many noise and channel
realizations) in all considered scenarios by up to 44%.

Keywords energy-efficiency • MIMO communication •
cross-layer optimization


Christian Senning, Georgios Karakonstantis, Andreas Burg
EPFL-STI-IEL-TCL, Station 11, ELG Oil,

Tel.: +41 21 693 6924
Fax: +41 21 693 2687

E-mail: {christian.senning, georgios.karakonstantis, andreas.burg}@epfl.ch

This paper extends the work published in 
ICASSP [32] 




1 Introduction 

Mobile communication - anytime, anywhere access to data and
communication services - has been continuously increasing since
the operation of the first cellular phone system. This growth is
combined with an increasing demand by consumers for small and
multifunctional products. Such products must be able to transmit
wirelessly not only voice, but also color pictures and video, as
well as to provide access to complex applications over the World
Wide Web [S]. Satisfying such a growing demand for wireless
communications has been achieved by advances in information
theory, combined with the ability to manufacture high-throughput
communication circuits. Unfortunately, concerns are being raised
that this scenario cannot continue forever. Specifically, as more
and more transistors are packed onto a single chip to support the
associated demand for high-performance communications, more of
them toggle in “active” mode or leak during “sleep” periods,
resulting in a substantial increase of on-chip power dissipation
[22]. Such increased power dissipation, in combination with the
slow improvements in battery capacity, limits the battery
run-time of portable devices and prevents them from adequately
meeting user expectations.

The classical optimization for high performance in terms of
throughput or error rate [13, 23] performed by goodput-guided
rate adaptation (RA) schemes may not be able to close the gap
between supplied and required energy. This problem may be
attributed to the fact that either the best error rate might not
always be needed, or the highest throughput might exceed
application requirements. In particular, the peak data rates in
the widely used IEEE 802.11n standard (600 Mbps) significantly
exceed the requirements for high-quality audio transmissions
(192 kbps) or high-definition video-on-demand services (e.g.,
YouTube.com limits the maximum rate of its content to 5 Mbps).
This means that we have reached a point where the peak data rate
of the wireless communication system is no longer always a
limiting factor. This can be argued especially for scenarios
where the cumulative data rate requested by all users in the same
wireless local area network (WLAN) channel is far less than the
system capacity, as in isolated networks with a small number of
users (e.g., networks within detached houses or small offices).
Therefore, we show that there are many opportunities which can be
exploited by alternative RA schemes to enhance the battery
run-time of portable devices.

Some works have proposed energy-efficient RA schemes by mainly
focusing on the minimization of the radiated power only. However,
such optimization may not be so beneficial from an
energy-efficiency point of view for today’s networks for
short-range wireless connectivity, which usually spend most of
the power in the signal processing of the receiver. Recent
studies shift the throughput objective of RA schemes to
energy-efficiency optimization of the overall system. Although
such works improved the energy-efficiency of wireless systems,
they have not yet revealed all the potential gains, since i) they
either focus solely on the RA, neglecting possible tuning knobs
on other layers, or ii) their knowledge of the physical (PHY)
layer is based on high-level information-theoretic energy models,
ignoring the interdependencies across all layers and modules, or
iii) they investigate a given, fixed PHY layer treated as a black
box, ignoring the potential to adjust processing energy according
to data rate or channel conditions (energy proportionality).

Contributions: In this paper, we propose to enhance the
energy-efficiency by a cross-layer approach, based on
energy-aware RA schemes combined with an energy proportional PHY
layer design, i.e., a receiver that is able to scale its effort
(and hence its energy consumption) according to the varying
complexity of successfully recovering the transmitted bits under
all given operating conditions. Specifically, our contributions
can be summarized as follows:

— An energy-guided RA scheme is developed and the 
achievable gains in terms of energy spent per successfully
received information bit are analyzed. Our results indicate
that such an RA scheme can result in significant improvements
in energy-efficiency, while still satisfying the data rate of
most of today’s applications.

— A compromise between the energy-guided RA and 
the classical goodput-guided RA is presented for applications
with high data rate requirements.


— Both proposed RA schemes are enabled through a 
fine-grained energy model of the PHY layer that captures many
interdependencies across various settings throughout the
operation of a packet-based wireless system.

— The available choices of classical RA, assumed by 
most existing works, are extended by exploiting 
circuit level techniques combined with algorithmic 
effort scaling. This makes it possible to realize a truly
energy proportional PHY layer and to maximize the
energy-efficiency gains compared to the classical
throughput-guided RA that does not exploit any of the proposed
PHY layer modifications.

— A case study based on an IEEE 802.11n compliant
PHY is shown for fixed and varying packet lengths.

Outline: Section 2 discusses prior publications for
energy-efficient wireless communication systems and corresponding
energy models. Section 3 elaborates on the background regarding a
typical frame-based communication system and Section 4 presents
the proposed approach. The required energy model is developed in
Section 5, while the applied techniques for enabling an energy
proportional PHY layer are described in Section 6. The proposed
optimizations are applied to a typical IEEE 802.11n environment
and the results are presented in Section 7. Finally, conclusions
are drawn in Section 8.

2 State of the Art 

Traditional research has been focusing on optimizing the transmit
power based on information-theoretic energy-efficiency metrics.
However, the proposed solutions start from a model in which the
device power consumption is essentially given by the radiated
energy [HI], [23]. Unfortunately, this assumption rarely holds in
practice, as for most battery-operated systems used for wireless
connectivity (e.g., WLAN) the radiated power is only a small part
of the total power consumption. Later studies have recognized
this fact and have started to take the active power consumed by
the hardware into account. While this defies the straightforward
application of information-theoretic tools and principles, it
also provides more opportunities for energy savings [58].

The hardware power has been accounted for in some early studies,
e.g., [T]. However, the point-to-point streaming system under
consideration in [T], which only considers a physical layer, the
corresponding traffic pattern, and the associated tuning knobs,
has little in common with today’s omnipresent short-range
wireless networks (e.g., IEEE 802.11n or IEEE 802.11ac). Based on
this observation, recent works have been focusing on providing
strategies to improve energy-efficiency at the
medium-access-control (MAC) layer [27, 36]. The latest studies on
the component level consider different algorithm choices under
the aspect of energy-efficiency, spanning both the algorithm and
the architectural layer.

Later publications introduced the concept of energy-aware
baseband processing, where scalability is put to service to
adjust the processing to the dynamically changing environment,
leading to even better overall energy-efficiency [23]. In this
context, a recent work by Intel tried to provide realistic power
measurements of the baseband for the popular IEEE 802.11n
standard, claiming to be the first ever to do so, indicating a
shift in design objectives from throughput-oriented communication
to energy-efficiency-oriented communication. In particular, [SS]
showed that the baseband processing of software-defined radios is
underutilized most of the time and states that this
underutilization can be used for power reduction (resulting in
the desired energy proportional behavior).

In order to exploit such underutilization, the MAC layer has to
be aware of the energy consumption of its associated PHY layer.
For this purpose, many models for the energy consumption of
wireless systems have been proposed in the literature. Some
models focus exclusively on the transmit power and do not capture
the main energy drains, while others jointly model the baseband
processing of the transmitter and the receiver. These models then
also result in optimization of the grid-powered access points
instead of focusing on the more critical battery operated
devices. Further works rely on measurements of entire chipsets or
network interface cards that do not provide insight into the
energy consumption of the PHY layer itself.

Although existing studies have indicated the need 
for improving energy-efficiency, there is still a need for 
true cross-layer optimization, enabled by an energy- 
guided RA that is aware of the energy consumption of 
an associated energy proportional PHY layer, through 
accurate models. 

3 Background 

Fig. 1 illustrates a point-to-point link of a frame-based
wireless communication system, which is typical for short-range
WLAN networks. The essential components for our consideration are
a grid powered, central access point (AP) and one or more battery
operated devices that are referred to as clients. In our
scenario, all clients receive application data (e.g., audio or
video) transmitted from the AP over a wireless channel.




Fig. 1: WLAN setup with high traffic load from access 
point to battery operated mobile clients. 

Both the AP and the client, capable of transmitting and receiving
multiple data streams using multiple antennas, consist of a PHY
layer that is responsible for the actual transmission and
reception of the data as well as a MAC layer that controls the
PHY layer. The PHY layer is composed of various hardware modules.
In transmit mode, these modules encode the application data,
modulate them, convert them into analog radio frequency (RF)
signals, and finally radiate them over the available antenna(s).
On the other side of the point-to-point link, the PHY layer
operates in receive mode and its modules apply the opposite
functions, converting the received RF signals into digital binary
data, demodulating and decoding them, and finally extracting the
application data.

We assume a system that operates in time-division-duplex (TDD)
and time-division-multiple-access (TDMA) mode and we consider a
typical asymmetric scenario in which data is mostly downloaded
from the AP to the client. Hence, the client is most of the time
in receive mode. However, there are also some short time-slots
during which each client gets into transmit mode for sending
acknowledgment (ACK) frames to the AP.

3.1 Packet Structure 

Fig. 2: Simplified illustration of the IEEE 802.11n transmission
protocol.

Fig. 3: Proposed approach for improved energy-efficiency of the
mobile client.

As illustrated in Fig. 2, the transmission sequence begins with a
frame start waiting period. The duration of this period depends
on proper sleep time prediction based on the power save poll MAC
protocol. The actual frame starts with a training sequence used
for frame start detection, initial frequency offset estimation,
and channel estimation. The initial channel estimate is then used
to detect the frame header containing information about the
subsequent data payload (e.g., the payload length in number of
bytes and the modulation and coding scheme (MCS), which together
determine the frame length in number of symbols). If the number
of spatial streams differs between header and payload, an
additional training sequence is required to enable the estimation
of the multiple-input multiple-output (MIMO) channel required
later for detection of the data payload. The final part of the
received frame is the data payload itself. Within an inter-frame
spacing (IFS), the receiver has to decide if the data payload was
correctly extracted and has to start transmitting an ACK frame.
After sending the ACK frame, the PHY layer goes into sleep mode
until it is again triggered by the MAC layer for the next frame
reception.
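
To make the relation between payload length, MCS, and frame length concrete, the following minimal sketch computes the number of payload symbols. The function name, the `overhead_bits` parameter, and the example numbers (260 data bits per symbol, 4 microseconds per symbol) are assumptions chosen for illustration and are not taken from the text.

```python
import math


def payload_symbols(payload_bytes: int, data_bits_per_symbol: int,
                    overhead_bits: int = 0) -> int:
    """Number of symbols needed to carry the payload.

    data_bits_per_symbol follows from the MCS; overhead_bits is a
    hypothetical knob for extra bits (e.g., service or tail bits).
    """
    if payload_bytes < 0 or data_bits_per_symbol <= 0 or overhead_bits < 0:
        raise ValueError("invalid frame parameters")
    total_bits = 8 * payload_bytes + overhead_bits
    return math.ceil(total_bits / data_bits_per_symbol)


# Example with assumed numbers: 1500-byte payload, 260 data bits per symbol.
n_symbols = payload_symbols(1500, 260)
payload_duration_us = n_symbols * 4.0  # assuming 4 us per symbol
```

A lower-rate MCS carries fewer data bits per symbol, so the same payload occupies more symbols and keeps the receiver active for longer.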

3.2 Classical Rate Adaptation 

A key feature of most communication systems for short-range
wireless connectivity, as described above, is their ability to
support a variety of transmission schemes to adjust the rate to
the current channel conditions. For instance, the modulation and
code rate can be selected along with other parameters such as the
number of spatial streams $N_{SS}$, which together define the
MCS, encoded in the header field of a packet. The transmission
mode and the values of the associated parameters are selected by
the MAC layer of the transmitter, based on channel state
information¹. In today’s systems, the main objective of
commercially available goodput-guided (GG) RA schemes is the
selection of the most appropriate system mode $\nu$ under given
channel conditions for the maximization of the goodput. The
system mode is selected from a set $\Omega_{GG}$ that contains
all possible transmission modes. Such transmission modes
determine the employed MCS and the frame length $L$ in bits. The
objective of GG RA is therefore given as follows

$$\nu_{GG} = \arg\max_{\nu} \{ (1 - P_e(\nu))\, \Phi(\nu) \}, \qquad (1)$$

where $\Phi(\nu)$ is the throughput and $P_e(\nu)$ is the
probability of a packet error. The GG RA therefore requires
estimating the error rate at run-time based on the current
channel conditions. For this, a variety of known techniques for
accurately estimating $P_e(\nu)$ can be applied, as discussed for
example in [18].

1 While data reception relies on accurate up-to-date channel
state information, we have observed that channel characteristics
relevant for RA remain stable over a long time.

The choice of $\nu$ under such a GG RA actually determines the
operating mode of both the transmitter and receiver. Due to
differences in the (de-)modulation and (de-)coding process as
well as differences in the packet structure and duration, this
can also lead to differences in the required signal processing
effort. Hence, the MCS selection by the RA clearly has an impact
on system power consumption on both sides of the wireless link.
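
The goodput-guided selection can be sketched in a few lines. This is only an illustration of the rule "maximize (1 - packet error probability) times throughput"; the `Mode` class, the mode names, and all numbers are hypothetical, and the packet error rates are assumed to come from some run-time estimator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    """One system mode; all field values below are illustrative."""
    name: str
    throughput_mbps: float    # throughput of the mode
    packet_error_rate: float  # estimated for the current channel


def goodput(mode: Mode) -> float:
    """Expected rate of successfully delivered bits in Mbps."""
    return (1.0 - mode.packet_error_rate) * mode.throughput_mbps


def select_goodput_guided(modes):
    """Return the mode with the highest goodput."""
    return max(modes, key=goodput)


example_modes = [
    Mode("low", 6.5, 0.00),
    Mode("mid", 65.0, 0.01),
    Mode("high", 130.0, 0.60),
]
best = select_goodput_guided(example_modes)
```

In this example the fastest mode loses to the middle one because its high error probability outweighs its nominal rate.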


4 Proposed Cross-Layer Approach for 
Improved Energy-Efficiency 

As we discussed, the client in modern wireless links is 
in many cases a battery operated device that operates 
most of the time in receive mode. The amount of data 
transmitted to the client is usually independent of the 
available peak data rates but given by use patterns and 
applications. Therefore, the main goal of our cross-layer 
energy-efficiency optimization is to reduce the energy 
consumption per successfully received information bit 
at the receiver to maximize the number of bits that can 
be received on a single battery charge. 

An illustration of the overall proposed approach is shown in
Fig. 3. The medium access control (MAC) layer at the AP
configures its PHY layer and forwards data to be transmitted. A
packet containing this data is then sent over a wireless channel
to the mobile client. The PHY layer of the client tries to
recover the transmitted data based on the received signal. The
energy consumption of the PHY layer of the client thereby depends
on the transmission mode and the “difficulty” of recovering the
data under given channel conditions. To account for this
dependency when choosing a mode, the energy consumption for
different configurations is modeled in the MAC layer of the
client. Based on a prediction of this energy model as well as on
channel conditions reported from the PHY layer, a suitable
transmission mode and corresponding receiver configurations are
suggested to the AP for future transmissions. Such an RA is
discussed in the next section of this publication.

4.1 Energy-Guided Rate Adaptation 

In our approach, we depart from the classical GG RA and propose
an energy-guided (EG) RA which aims at reducing the energy
consumption of the receiver. Specifically, the method selects the
system mode according to

$$\nu_{EG} = \arg\min_{\nu} \{ \eta(\nu) \}, \qquad (2)$$

with $\nu \in \Omega_{EG}$, where $\Omega_{EG}$ corresponds to
the set of all meaningful system modes that include the modes in
$\Omega_{GG}$ as well as receiver-specific settings. The function
$\eta(\nu)$ represents the energy consumption of the battery
operated client per received information bit. Note that at this
point no attention is specifically paid to the impact on goodput
or error rate, except for the implicit dependency of $\eta(\nu)$
on the data rate $\Phi(\nu)$ and the probability of a packet
error $P_e(\nu)$. As we will elaborate, this dependency still
provides a small bias toward the higher-rate modes, but also
respects the availability of modes associated with lower rates
for the benefit of energy-efficiency.

In contrast to the classical RA performed at the transmitter
(based on channel quality feedback), energy-guided RA has to be
performed at the battery operated client. This is because the AP
can usually not accurately estimate or model the power
consumption of a third-party battery operated device. To this
end, we assume a setup in which MCSs are suggested from the
client to the MAC layer of the AP² for the next data frame
(assuming a reasonably static channel), during the ACK frames.

For occasions when the goodput achieved by the EG RA does not
satisfy the required data rate of an application, we propose a
goodput-aware energy-guided (GAEG) RA that provides a compromise
between the goodput achieved by the GG RA and the
energy-efficiency provided by the EG RA. Such a compromise is
achieved by choosing the mode $\nu_{GAEG}$ with the best goodput
and an energy per bit that is upper bounded by the factor
$k > 1$ times the energy per bit of $\nu_{EG}$ based on

$$\nu_{GAEG} = \arg\max_{\nu} \{ (1 - P_e(\nu))\, \Phi(\nu) \mid \eta(\nu) < k\, \eta(\nu_{EG}) \}. \qquad (3)$$

In this approach the MAC layer is able to trade energy
consumption versus throughput at run time and therefore allows
adjustment to the application requirements or for systems
operated close to the capacity bound, by selecting an appropriate
value of the factor $k$.
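
The two client-side selection rules can be illustrated with a short sketch: the energy-guided rule picks the mode with the lowest energy per bit, and the goodput-aware variant picks the best goodput among the modes whose energy per bit stays below k times that minimum. The `Mode` class, the mode names, and every number below are hypothetical; in practice the energy per bit would come from the receiver energy model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    """One system mode; all field values below are illustrative."""
    name: str
    throughput_mbps: float
    packet_error_rate: float
    energy_per_bit_nj: float  # energy per received information bit


def goodput(mode: Mode) -> float:
    return (1.0 - mode.packet_error_rate) * mode.throughput_mbps


def select_energy_guided(modes):
    """EG rule: minimize the energy per information bit."""
    return min(modes, key=lambda m: m.energy_per_bit_nj)


def select_goodput_aware_energy_guided(modes, k: float):
    """GAEG rule: best goodput within k times the EG energy per bit."""
    if k <= 1.0:
        raise ValueError("k must be greater than 1")
    budget = k * select_energy_guided(modes).energy_per_bit_nj
    candidates = [m for m in modes if m.energy_per_bit_nj < budget]
    return max(candidates, key=goodput)


modes = [
    Mode("A", 13.0, 0.00, 2.0),
    Mode("B", 65.0, 0.01, 2.6),
    Mode("C", 130.0, 0.05, 4.0),
]
eg = select_energy_guided(modes)
gaeg_tight = select_goodput_aware_energy_guided(modes, k=1.5)
gaeg_loose = select_goodput_aware_energy_guided(modes, k=2.5)
```

Because k is greater than 1, the energy-guided mode itself always satisfies the constraint, so the candidate list is never empty. Increasing k moves the choice from the energy-guided mode toward the goodput-guided one.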

2 Important standards such as IEEE 802.11n already include
this possibility.



Fig. 4: Simplified PHY layer block diagram with the 
processing rate of the modules highlighted. 

4.2 Augmenting System Modes 

In addition to shifting the RA to the client side, the proposed
RA in (2) does not only select an MCS and the packet length $L$,
but also other configuration parameters of the client. Such an
extended set $\Omega_{EG}$ of configuration parameters may
include: i) algorithmic choices, such as different MIMO detection
algorithms, ii) the number of active receive chains, and iii)
applied circuit level techniques, like clock-gating or dynamic
voltage and frequency scaling (DVFS). Due to these extended
system modes, the PHY layer can achieve a better energy
proportional behavior, compared to the fixed PHY layers that were
used in prior works. With these extensions, the RA now has more
selection choices, which extend the Pareto-frontier in terms of
maximizing the achievable energy-efficiency gains.
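
One way to picture the extended mode set is as a filtered Cartesian product of the classical choice and the receiver-side knobs. The sketch below is an assumption-laden illustration: the field names, the detector and DVFS labels, and the rule that the number of active receive chains must be at least the number of spatial streams are all chosen here for the example and are not specified in the text.

```python
from itertools import product
from typing import NamedTuple


class ExtendedMode(NamedTuple):
    spatial_streams: int  # part of the classical MCS choice
    detector: str         # algorithmic choice (hypothetical labels)
    rx_chains: int        # number of active receive chains
    dvfs_level: str       # circuit-level setting (hypothetical labels)


def enumerate_extended_modes(max_streams, max_rx_chains, detectors, dvfs_levels):
    """List all combinations that pass a simple plausibility filter."""
    modes = []
    for n_ss, det, n_rx, lvl in product(range(1, max_streams + 1), detectors,
                                        range(1, max_rx_chains + 1), dvfs_levels):
        # Assumed rule: separating n_ss streams needs at least n_ss chains.
        if n_rx >= n_ss:
            modes.append(ExtendedMode(n_ss, det, n_rx, lvl))
    return modes


extended = enumerate_extended_modes(
    max_streams=2, max_rx_chains=2,
    detectors=("linear", "near-ML"), dvfs_levels=("nominal", "scaled"))
classical_choices = {m.spatial_streams for m in extended}
```

Even this small example turns two classical choices into twelve extended modes, which is what gives the rate adaptation more points to pick from.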

5 Receiver Energy Model 

To perform the energy-guided mode selection at run time, the MAC
layer of the client needs to estimate the power consumption of
the modules composing the PHY layer for each possible system mode
$\nu$. To enable such fine-grained energy awareness, we model the
energy consumption of the PHY layer by partitioning the
contribution of each module based on their participation in the
different phases of the transmission protocol shown in Fig. 2.

In order to understand the energy consumption of the PHY and MAC
layers, a detailed energy model is needed. To this end, we have
to understand the basic operating principles of the PHY layer
under consideration. In this paper, we use an IEEE 802.11n
compliant PHY layer implementation based on [5] as a case study.

5.1 Receiver Architecture 

A simplified architecture of a typical WLAN RF and PHY layer
receiver is illustrated in Fig. 4. The signal from each of the
$N_{RX}$ antennas is fed into a separate analog/RF frontend where
the radio frequency signal is down-converted to baseband and
digitized. After a parallel-to-serial converter that
time-multiplexes the receive chains, a digital frontend is
responsible for frame start detection as well as for
coarse-grained frequency synchronization. When a frame start is
detected, the signals are forwarded to a time-multiplexed fast
Fourier transform (FFT). The frequency-domain representation of
the training sequences of the received frame is fed to a channel
estimation and preprocessing module. There, the channel state
information is extracted for each packet based on the training
sequence and all computations for MIMO signal detection that only
depend on the channel are performed. The output of the channel
estimation and preprocessing [25], [55] module is fed into the
subsequent MIMO detector together with the frequency-domain
baseband samples of the data payload. The channel state
information is further forwarded to the MAC layer as a basis for
the RA for future data frames. After MIMO detection, the bits
(with or without reliability information) are forwarded to the
channel decoding module that comprises a deinterleaver and a
Viterbi decoder or an LDPC decoder.

In our transmission scenario, shown in Fig. 2, the analog/RF
frontend is always turned on (in receive or transmit mode) except
during sleep periods. The highest work load of the subsequent
digital frontend is during the frame start detection and the
frequency-offset estimation in the first training phase. The FFT
processes data during the training phases, the header phase, and
the data payload phase of the received frame. During the training
phases the channel estimation and preprocessing module is also
active. The subsequent MIMO detection and the channel decoding
modules process both header and data payload.

The processing rate of the different modules is indicated in
Fig. 4. Most modules run at the symbol rate of the transmission
that is determined by the utilized signal bandwidth of the
system. The channel estimation and preprocessing module has to
process data during short periods once per frame, when the
channel state information is estimated. Otherwise the module is
idle. The channel decoding module processes at the coded bit rate
of the transmission that varies in IEEE 802.11n compliant systems
over a large range from 13 Mbps to 720 Mbps, depending on the
selected MCS.
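
The quoted range of coded bit rates can be reproduced with a short calculation. The sketch assumes the usual IEEE 802.11n OFDM numerology, which is not spelled out in the text: 52 data subcarriers and a 4 microsecond symbol for the slowest case (20 MHz, long guard interval), and 108 data subcarriers and a 3.6 microsecond symbol for the fastest case (40 MHz, short guard interval). The function name is hypothetical.

```python
def coded_bit_rate_mbps(spatial_streams: int, bits_per_subcarrier: int,
                        data_subcarriers: int, symbol_duration_us: float) -> float:
    """Coded bits per microsecond, which equals Mbps."""
    coded_bits_per_symbol = spatial_streams * bits_per_subcarrier * data_subcarriers
    return coded_bits_per_symbol / symbol_duration_us


# Slowest case: 1 stream, BPSK (1 bit), 52 data subcarriers, 4.0 us symbol.
lowest = coded_bit_rate_mbps(1, 1, 52, 4.0)
# Fastest case: 4 streams, 64-QAM (6 bits), 108 data subcarriers, 3.6 us symbol.
highest = coded_bit_rate_mbps(4, 6, 108, 3.6)
```

The ratio between the two extremes is more than a factor of 50, which is the spread in per-bit processing time that the channel decoder can potentially exploit.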

5.2 LUT based Energy Model 

We now consider the energy per bit of the receiver as 
our metric of main interest. To this end, we start by 
dividing the energy per received packet into three main 
contributions: 


— A constant energy overhead for synchronization, 
header processing, training, channel-rate processing, and
transmission of an ACK frame, $e_H(\nu)$, in Joules. This part
of the overall energy consumption depends partially on the
choice of $\nu$ since different MCSs and different antenna
configurations also change the duration of the training and
have different energy costs.

— The second contribution to the overall energy consumption
comprises the RF and the baseband processing during the data
phase. This part is characterized by the power consumption
$p_{BB}(\nu)$ in Watts and the duration of the data phase. The
latter is determined by the length $L$ and the throughput
$\Phi(\nu)$ so that the corresponding contribution to the
energy-per-frame amounts to
$e_{BB}(\nu) = p_{BB}(\nu) \frac{L}{\Phi(\nu)}$.

— The last contribution to the total energy consumption of the
PHY layer at the receiver comprises mostly the channel coding,
which is typically carried out on a bit-by-bit basis and can be
described by the energy per bit of the channel decoding
$\eta_{CC}(\nu)$ and the length of the frame $L$ as
$e_{CC}(\nu) = L\, \eta_{CC}(\nu)$.

Combining the three energy consumption contributions above and
normalizing with the average number of successfully transmitted
bits per packet, we obtain

$$\eta(\nu) = \frac{\frac{e_H(\nu)}{L} + \frac{p_{BB}(\nu)}{\Phi(\nu)} + \eta_{CC}(\nu)}{1 - P_e(\nu)} \qquad (4)$$

as our metric of interest for optimization of the energy spent
per information bit at the PHY layer of the client.

One straightforward and effective implementation for the hardware
characteristics of such an energy model can be realized with a
look-up table (LUT), which can be configured off-line for any
target design (such as the IEEE 802.11n PHY layer that we use for
the case study in this paper). For such an energy model a
detailed discussion of its parameters is provided in the
following subsection.
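
A LUT-based evaluation of the energy per information bit might look as follows. The sketch assumes the metric has the form implied by the three contributions described above: the overhead energy divided by the frame length, plus the baseband power divided by the throughput, plus the decoding energy per bit, all divided by the probability of a successful packet. The `LutEntry` class, its field names, and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LutEntry:
    """Off-line characterized values for one system mode (illustrative)."""
    overhead_energy_j: float          # per-frame overhead energy
    baseband_power_w: float           # RF and baseband power in the data phase
    decoding_energy_j_per_bit: float  # channel decoding energy per bit
    throughput_bps: float             # throughput of the mode


def energy_per_bit(entry: LutEntry, frame_bits: int,
                   packet_error_rate: float) -> float:
    """Energy in Joules per successfully received information bit."""
    if frame_bits <= 0 or not 0.0 <= packet_error_rate < 1.0:
        raise ValueError("invalid frame length or packet error rate")
    per_attempt = (entry.overhead_energy_j / frame_bits
                   + entry.baseband_power_w / entry.throughput_bps
                   + entry.decoding_energy_j_per_bit)
    return per_attempt / (1.0 - packet_error_rate)


# Example with assumed numbers.
entry = LutEntry(overhead_energy_j=8e-6, baseband_power_w=0.13,
                 decoding_energy_j_per_bit=1e-9, throughput_bps=65e6)
eta = energy_per_bit(entry, frame_bits=8000, packet_error_rate=0.2)
```

Only the packet error rate has to be estimated at run time; everything else is a table look-up indexed by the system mode.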


5.3 Energy Model Characterization 

In order to characterize the energy model based on the IEEE
802.11n compliant PHY layer implementations presented in [5] and
on a published analog frontend implementation, we will first
elaborate on the components contributing to $e_H(\nu)$,
$e_{BB}(\nu)$, and $e_{CC}(\nu)$. In a next step, we present an
automated component-wise calibration method that is based on
post-layout gate-level power simulations of PHY layer
implementations.
The first term of (4), the energy overhead $e_H(\nu)$, is
composed of

$$e_H(\nu) = e_{af}(\nu) + e_{df}(\nu) + e_{fft}(\nu) + e_{chpp}(\nu) + e_{det}(\nu) + e_{ack}, \qquad (5)$$

where $e_{af}(\nu)$, $e_{df}(\nu)$, $e_{fft}(\nu)$,
$e_{chpp}(\nu)$, and $e_{det}(\nu)$ correspond to the energy used
by the analog frontend, the digital frontend, the FFT, the
channel estimation and matrix preprocessing circuit, and the
detector, respectively, during processing of the overhead
illustrated in Fig. 2. The final component $e_{ack}$ corresponds
to the energy consumed by the PHY and RF to transmit an ACK frame
to the AP.

To calibrate $e_H(\nu)$, we measured the average power
consumption of the digital frontend, the FFT, the preprocessing
module, the detector, and the RF. The average power consumption
is then multiplied with the duration of the active time during
the overhead processing of the corresponding components. While
the active time during the header processing of the FFT, the
preprocessing, and the detector loosely depends on the mode
$\nu$, the active time of sending an ACK frame is fixed. For a
given MCS, we can assume that the active time of each component
contributing to the overhead processing energy is approximately
constant.
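
The calibration of the overhead energy amounts to a sum of power-times-duration products plus the fixed ACK energy. The following sketch shows that arithmetic only; the component keys mirror the modules named above, while the function name and all power and time values are made up for illustration.

```python
def overhead_energy_j(avg_power_w: dict, active_time_s: dict,
                      ack_energy_j: float) -> float:
    """Sum of per-component (average power x active time) plus ACK energy."""
    if set(avg_power_w) != set(active_time_s):
        raise ValueError("power and time tables must list the same components")
    component_energy = sum(avg_power_w[name] * active_time_s[name]
                           for name in avg_power_w)
    return component_energy + ack_energy_j


# Hypothetical values for one system mode.
power_w = {"af": 0.10, "df": 0.02, "fft": 0.03, "chpp": 0.05, "det": 0.04}
time_s = {"af": 40e-6, "df": 16e-6, "fft": 24e-6, "chpp": 8e-6, "det": 8e-6}
e_h = overhead_energy_j(power_w, time_s, ack_energy_j=1.24e-6)
```

Since the active times are approximately constant for a given MCS, one such value per mode can be stored in the look-up table.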

The second term of (4), the baseband power consumption
$p_{BB}(\nu)$, is composed of

$$p_{BB}(\nu) = p_{af}(\nu) + p_{df}(\nu) + p_{fft}(\nu) + p_{det}(\nu), \qquad (6)$$

where $p_{af}(\nu)$, $p_{df}(\nu)$, $p_{fft}(\nu)$, and
$p_{det}(\nu)$ correspond to the power consumption of the RF, the
digital frontend, the FFT, and the detector, respectively.

The last term in (4), the energy per bit $\eta_{CC}(\nu)$, is
determined only by the channel decoding module. To calibrate the
LUT for that module, we measure the energy consumption per bit of
the channel decoding module.

The characterization flow for the energy model is illustrated in
Fig. 5. We first generate stimuli and then monitor, through
simulations of the pre-synthesis hardware-description-language
model, the active modules for each system mode $\nu$ in order to
accurately capture the activity pattern of the actual hardware.
The start and end time stamps of all active periods for each
selected module are used to automatically generate the required
scripts for post-layout power simulation. In parallel, the
simulation of the post-layout circuit of the PHY layer
implementation is performed. This simulation outputs value change
dump (VCD) files required for accurate power estimation. In a
next step, the VCD files and the generated scripts are used to
generate a library storing the power values for the selected
modules during the periods of interest.
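
The last step of this flow, restricting power values to the active periods of a module, can be pictured with the toy sketch below. It is not the tool flow used by the authors: it assumes a power trace has already been exported as uniformly spaced (time, power) samples, and the function name and data layout are hypothetical.

```python
def average_power_in_windows(samples, windows) -> float:
    """Average power over all samples that fall inside an active window.

    samples: list of (time_ns, power_mw) pairs, assumed uniformly spaced.
    windows: list of (start_ns, end_ns) pairs, start inclusive, end exclusive.
    """
    selected = [power for time, power in samples
                if any(start <= time < end for start, end in windows)]
    if not selected:
        raise ValueError("no samples inside the active windows")
    return sum(selected) / len(selected)


# Hypothetical trace of one module and its active periods.
trace = [(0, 1.0), (10, 5.0), (20, 7.0), (30, 1.0), (40, 9.0)]
active = [(10, 30), (40, 50)]
p_avg_mw = average_power_in_windows(trace, active)
```

Averaging only inside the active windows keeps idle periods from diluting the per-module power values stored in the library.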



Fig. 5: Characterization of all modules in the PHY layer ASIC
using an automated power library generation flow.


5.4 Discussion on the Proposed Energy Model 

Before proceeding with the optimization of (4) by properly
choosing $\nu$ for a large, but fixed $L$, a brief discussion of
the implications of this model and the dependency between its
variables (through the choice of $\nu$) provides some insight
into trends and ideas to simplify the estimation of the potential
for energy savings.

As a starting point for our considerations, we note that in the
rather generic expression in (4), $e_H(\nu)$, $p_{BB}(\nu)$, and
$\eta_{CC}(\nu)$ all depend on $\nu$. However, in practice, we
note that at least for large $L$ the energy spent on overhead
processing $e_H(\nu)$ is often insignificant³ compared to the
other terms in (4). Furthermore, we assume that for a traditional
PHY implementation, both $p_{BB}(\nu)$ and $\eta_{CC}(\nu)$ only
have a limited dependency on the mode $\nu$. Hence, choosing
$\nu$ to maximize $\Phi(\nu)$ trivially minimizes the numerator
of (4) with diminishing returns due to the bias terms.
Unfortunately, $P_e(\nu)$ also approaches one as $\Phi(\nu)$
increases, which ultimately limits the achievable goodput as well
as the energy gains. To eliminate the rather complex relationship
between $\nu$ and $P_e(\nu)$, we take advantage of the presence
of a fast and conservative RA. For each channel realization we
divide the available modes into two groups: one that is able to
get a packet across with very high probability
($P_e(\nu) \approx 0$) and a second group of modes that will
almost surely fail ($P_e(\nu) \approx 1$). Under these
assumptions, the choice of the highest-rate mode that is still
reliable is clearly the most favorable strategy and little
potential would exist for further energy-efficiency optimization.

However, if different modes are able to provide
significantly different $e_H(\nu)$, $P_{BB}(\nu)$, or $\eta_{CC}(\nu)$
for the same channel realization, while still maintaining
$P_e(\nu) \approx 0$, we may be able to obtain further
energy-efficiency gains over the straightforward choice of the
mode that provides the best goodput.

3 The payload phase of the transmission for low to medium
throughput modes using a large L is much longer than the
duration of the header shown in Fig. 2.

6 System Mode Extension for Enhanced 
Energy-Efficiency 

To enable further energy-efficiency improvements beyond
the ones associated with the natural choice of
the highest-rate mode, the baseline PHY layer implementation
must first be improved. The main idea is to
first introduce a more energy proportional behavior, in
the sense that $e_H(\nu)$, $P_{BB}(\nu)$, and $\eta_{CC}(\nu)$ scale
better with the different processing requirements of the
different modes $\nu$. In the second step, we then introduce
new modes which are not necessarily optimal in terms of
their goodput, but may still provide an overall
energy-efficiency advantage by further reducing processing
requirements at the expense of rate or reliability (error
rate).

6.1 Energy Proportionality Without Impact on 
Goodput Through DVFS 

A well-designed physical layer implementation already
provides a certain degree of energy-proportionality.
Hence, the energy consumption of such a PHY depends
more or less linearly on the number of operations required
to extract the payload data from the received
frame. A good example is the channel decoding, which
contributes a constant energy per decoded bit to the
energy expression. Unfortunately, such circuits do not
exploit differences in the time available to process each
individual bit. To take advantage of different throughput
requirements in different modes, we notice that in very
large integrated circuits the maximum operating frequency
of a digital circuit scales approximately linearly with the
supply voltage (within a reasonable range). However, the
power consumption of the circuit scales quadratically with
the supply voltage. Therefore, we can translate relaxed
throughput requirements (in terms of operations per
second) into further energy savings per operation, an
idea that is commonly referred to as DVFS.
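
The scaling argument above can be written out with the usual first-order CMOS model. This is a sketch under assumptions that are not taken from the paper's own derivation: dynamic power $P \approx \alpha C V^2 f$, leakage neglected, and $f_{max} \propto V$. The symbols $\alpha$ (activity factor), $C$ (switched capacitance), and $r$ (ratio of required to peak operation rate) are introduced here for illustration only.

```latex
\begin{align}
E_{\mathrm{op}} &= \frac{P}{f} \approx \alpha C V^{2}, \\
f_{\max} \propto V \;&\Rightarrow\; V(r) \approx r\, V_{\max}, \qquad 0 < r \le 1, \\
\frac{E_{\mathrm{op}}(r)}{E_{\mathrm{op}}(1)} &\approx r^{2}.
\end{align}
```

Under these assumptions, halving the required operation rate would reduce the energy per operation to roughly one quarter. In a real circuit the supply voltage cannot be lowered arbitrarily, so the saving saturates at some minimum voltage.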

Practical applications of DVFS are clearly associated
with many difficulties and with overhead (e.g.,
voltage regulators and the like). In [37] it is shown that
integrated voltage regulators with a conversion efficiency
better than 80-90% and a reasonable circuit size are
feasible in modern CMOS technology. More specifically,
integrated DC-DC converters providing an output current
of several amperes per square millimeter of silicon
area have been proposed in [7] and [2]. Hence, voltage
regulators for circuits consuming a few hundred milliwatts
require only a small fraction of a square millimeter
of silicon area. However, in order to explore the limits
of energy-optimal data transmission, regardless of the
technical issues at the circuit level, we intentionally neglect
the associated overhead for voltage conversion in this
paper.

Fig. 6: Relative energy-efficiency improvement of the
channel coding module using DVFS for all supported
transmission schemes (x-axis: system throughput [Mbps]).

Application of DVFS to channel decoding: A first obvious
opportunity to apply DVFS in a conventional
IEEE 802.11n-compliant receiver is the channel decoding
module. For this module, the processing rate varies
significantly (from 13 Mbps to 720 Mbps). For transmission
modes with a system data rate of up to 300 Mbps, one
decoding core is used, whereas two cores are used for
larger data rates. With DVFS, this part of the
receiver can take advantage of the reduced rate and
provide a better overall energy per bit without a negative
impact on the error rate performance of the receiver.

The relative energy savings for the channel decoding
module when applying DVFS are shown in Fig. 6. The
lower the data rate, the larger the energy savings from
DVFS, as the clock frequency and consequently the supply
voltage can be reduced. Furthermore, when the number of
channel decoding cores changes from one to two, as specified
in the IEEE 802.11n standard, the achievable
energy-efficiency gain increases again, as the data rate per
decoder core is reduced.
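
The trend described above can be reproduced qualitatively with a small script. It is only an illustrative sketch: it assumes an idealized DVFS model (frequency proportional to voltage, energy per bit proportional to voltage squared) and a hypothetical lower voltage bound `V_MIN_RATIO`, none of which are taken from the paper. The throughput values are the axis ticks of Fig. 6 and the one-core/two-core switch at 300 Mbps follows the text; the function names are made up for this example.

```python
# Illustrative sketch only: idealized DVFS scaling for the channel decoder.
# Assumptions (not from the paper): f_max is proportional to V, the energy
# per bit is proportional to V^2, and V cannot drop below V_MIN_RATIO * V_nom.

V_MIN_RATIO = 0.5        # hypothetical lower bound on the supply voltage
CORE_SWITCH_MBPS = 300   # from the text: one core up to 300 Mbps, two above
THROUGHPUTS_MBPS = [15, 45, 90, 120, 150, 240, 300, 450, 600]  # Fig. 6 ticks


def cores_for(throughput_mbps: float) -> int:
    """Number of active decoder cores for a given system throughput."""
    return 1 if throughput_mbps <= CORE_SWITCH_MBPS else 2


def relative_energy_per_bit(throughput_mbps: float) -> float:
    """Energy per bit relative to a core running at nominal voltage."""
    per_core = throughput_mbps / cores_for(throughput_mbps)
    # A core is assumed to be dimensioned for the highest per-core rate.
    peak_per_core = max(t / cores_for(t) for t in THROUGHPUTS_MBPS)
    voltage_ratio = max(per_core / peak_per_core, V_MIN_RATIO)
    return voltage_ratio ** 2


if __name__ == "__main__":
    for t in THROUGHPUTS_MBPS:
        saving = 100.0 * (1.0 - relative_energy_per_bit(t))
        print(f"{t:4d} Mbps, {cores_for(t)} core(s): {saving:5.1f}% saving")
```

In this model the saving drops to zero at 300 Mbps, where a single core runs at its peak rate, and rises again at 450 Mbps, where the load is split across two cores. That matches the qualitative behavior described for Fig. 6, but the absolute numbers carry no meaning.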

6.2 Energy Proportionality With Impact on Goodput 

Two other possibilities for enhancing the energy
proportional behavior of the PHY layer, which can affect
the goodput, are the following:

Sub-optimal algorithms: A possible modification to improve
energy proportional behavior is to take advantage
of the fact that in some situations different receiver
algorithms with a noticeable complexity difference exhibit
a similar error-rate behavior. Hence, choosing an algorithm
with higher computational complexity may not yield
a goodput improvement that outweighs the higher
energy cost.

A good example is the choice of the MIMO detector,
where a solution with close to maximum likelihood
(ML) or a posteriori probability (APP) performance
can be combined with a low-complexity MIMO detector.
The ML or APP detector provides the good performance
required for bad conditions (e.g., low signal to
noise ratio (SNR) or ill-conditioned MIMO channels),
while the low-complexity alternative uses less energy.
For conditions where the occurrence of a frame error is
independent of the detector (e.g., very high SNR, very
low SNR), the low-complexity alternative obviously
achieves better energy-efficiency.

Exploiting active antenna selection for applying DVFS:
A second opportunity to apply DVFS exists in the case of
reduced-complexity receiver configurations that may be
suboptimal in terms of error rate performance but can
be advantageous in terms of energy consumed. Since
the corresponding reduced-complexity modes will generally
only allow for a goodput that is lower than that
provided by more complex modes, they can only be
advantageous when savings in $e_H(\nu)$, $P_{BB}(\nu)$, and
$\eta_{CC}(\nu)$ make up for the goodput loss. The most obvious
target for such additional modes is the choice of the number
of active antennas $N_{RX}$ at the receiver [25]. Such
adaptation allows the power of the RF frontend to scale
roughly linearly with $N_{RX}$, but also limits the number of
spatial streams to $N_{SS} \le N_{RX}$. The baseband processing
can take further advantage of this scaling due to the
reduced number of operations (e.g., number of multiplications
and additions), but also benefits from DVFS
since fewer receive chains must be processed in a
time-interleaved fashion and therefore more time is available
per active receive chain.


7 Evaluation 

In this section we explore the limits of the gains
in energy-efficiency that can be achieved with the
proposed modifications to the PHY layer combined
with the proposed energy-guided RA scheme. Furthermore,
we compare the proposed RA schemes with the
classical GG RA. We perform this evaluation on an
IEEE 802.11n-compliant wireless communication system
for two different scenarios described in the following
paragraphs.


7.1 Scenarios 

In our evaluation, we consider two different scenarios. In
the first scenario, the AP transmits frames with a fixed
length L of 1.5 kB over a Rayleigh block-fading channel.
For this simple scenario we compare the impact of all
three discussed RA schemes in detail. In the second
scenario, the AP transmits frames with a varying length.
To this end, we enable the AP to perform aggregation⁴
of up to 16 frames, each of 1.5 kB.

For both scenarios, we considered the RF given in
[117] combined with the IEEE 802.11n-compliant PHY
layer given in [5], with up to 4 spatial streams and up
to 4 receive antennas using transmissions with 40 MHz
bandwidth. For channel coding we use only the convolutional
code defined in the standard. The 32 mandatory
MCSs defined in IEEE 802.11n result in a variable
frame duration for a given L.

For $10^3$ channel realizations, the reception of a frame
has been simulated for all considered system modes.
In particular, the simulated system modes include all
mandatory MCSs of the IEEE 802.11n standard. For
each MCS, the number of active receive antennas varies
between $N_{SS}$ (the minimal number of receive antennas
required for the specific MCS) and 4. In addition, all
MCSs have been simulated with a hard-output lattice
reduction aided linear minimum mean squared error
(MMSE) MIMO detector (LRALD) [31] and a
low-complexity soft-output MMSE detector, resulting in
112 different system modes⁵ for each MIMO detector
algorithm, comprising $\Omega_{EG}$.
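
The count of 112 system modes per detector can be checked with a short enumeration. The sketch below assumes the 32 MCSs are the IEEE 802.11n indices 0 to 31 with $N_{SS}$ = MCS // 8 + 1 spatial streams, and it models each mode as a hypothetical tuple `(mcs, n_rx, dvfs)`. The tuple layout and function names are illustrative and not taken from the paper.

```python
# Sketch: count the receiver-side system modes per MIMO detector.
# Assumption: MCS indices 0..31 with N_SS = MCS // 8 + 1 spatial streams.
# The (mcs, n_rx, dvfs) tuple layout is hypothetical.

MAX_RX = 4  # maximum number of receive chains


def spatial_streams(mcs: int) -> int:
    """Number of spatial streams for an MCS index in 0..31."""
    return mcs // 8 + 1


def enumerate_modes():
    """List all (mcs, n_rx, dvfs) combinations considered per detector."""
    modes = []
    for mcs in range(32):
        # Baseline: all 4 receive chains active, no DVFS.
        modes.append((mcs, MAX_RX, False))
        # DVFS-enabled variants with N_SS..4 active receive chains.
        for n_rx in range(spatial_streams(mcs), MAX_RX + 1):
            modes.append((mcs, n_rx, True))
    return modes


if __name__ == "__main__":
    modes = enumerate_modes()
    print("modes per detector:", len(modes))
    for n_rx in range(MAX_RX, 0, -1):
        count = sum(1 for _, n, dvfs in modes if dvfs and n == n_rx)
        print(f"{n_rx} receive chain(s) with DVFS: {count} modes")
```

The per-antenna counts produced this way (32, 32, 24, 16, and 8) agree with the breakdown given in footnote 5.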

7.2 Ideal Estimation of Channel Conditions 

As we discussed in Section 4, any RA requires an accurate
estimation of $P_e(\nu)$ for the actual channel conditions.
Since the purpose of this paper is to explore
the potential and the limits of energy-awareness, we
intentionally assume a genie-aided approach for the RA.
This simplification avoids introducing uncertainties due
to specific RA strategies and it simplifies the full
explanation of the setup under consideration. The genie
provides an upper bound on goodput and energy-efficiency
by perfectly predicting transmission failures by sending
each packet in all modes for each channel and noise
realization, which, of course, is only feasible in simulation
but could be approached in practice with carefully
designed algorithms. Hence, $P_e(\nu)$ is either one or
zero, which boils down to limiting the selection to the
error-free modes according to a specific optimization
criterion⁶.

4 Frame error rate remains independent of the number of
aggregated frames (individual ACK in the same ACK frame). Still,
overhead (PHY and ACK for small packets individually) can be
avoided, which motivates aggregation of multiple frames.

5 32 transmission modes received with 4 receive chains without
applying DVFS; 32 transmission modes with 4 receive chains using
DVFS at the channel decoder; 24 transmission modes using
3 receive chains and DVFS wherever applicable; 16 transmission
modes using 2 receive chains and DVFS wherever applicable; 8
transmission modes using 1 receive chain and DVFS wherever
applicable.

6 Practical implementations to estimate $P_e(\nu)$ can be found in [?].

Fig. 7: Energy consumption per successfully transmitted bit.

Fig. 8: Relative power consumption with respect to GG
RA without DVFS.

Fig. 9: Throughput achieved by the different schemes.
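
The genie-aided selection of Section 7.2 can be sketched in a few lines. The exact criteria are defined earlier in the paper, so the GAEG rule used here is only one plausible reading and should be treated as an assumption: among the error-free modes whose energy per bit is within a factor k of the minimum, pick the one with the highest goodput. The `Mode` class, the helper names, and all numbers in `EXAMPLE_MODES` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Mode:
    name: str
    goodput_mbps: float        # goodput if the frame is received
    energy_nj_per_bit: float   # energy per successfully received bit
    error_free: bool           # genie knowledge: P_e = 0 (True) or 1 (False)


def reliable(modes: List[Mode]) -> List[Mode]:
    """Keep only the modes the genie marks as error-free."""
    return [m for m in modes if m.error_free]


def select_gg(modes: List[Mode]) -> Optional[Mode]:
    """Goodput-guided RA: highest goodput among the error-free modes."""
    ok = reliable(modes)
    return max(ok, key=lambda m: m.goodput_mbps) if ok else None


def select_eg(modes: List[Mode]) -> Optional[Mode]:
    """Energy-guided RA: lowest energy per bit among the error-free modes."""
    ok = reliable(modes)
    return min(ok, key=lambda m: m.energy_nj_per_bit) if ok else None


def select_gaeg(modes: List[Mode], k: float = 1.05) -> Optional[Mode]:
    """Assumed GAEG rule: best goodput within k times the minimum energy."""
    ok = reliable(modes)
    if not ok:
        return None
    budget = k * min(m.energy_nj_per_bit for m in ok)
    candidates = [m for m in ok if m.energy_nj_per_bit <= budget]
    return max(candidates, key=lambda m: m.goodput_mbps)


# Hypothetical modes for one channel realization (made-up numbers).
EXAMPLE_MODES = [
    Mode("A", goodput_mbps=300.0, energy_nj_per_bit=3.00, error_free=True),
    Mode("B", goodput_mbps=270.0, energy_nj_per_bit=2.00, error_free=True),
    Mode("C", goodput_mbps=285.0, energy_nj_per_bit=2.08, error_free=True),
    Mode("D", goodput_mbps=450.0, energy_nj_per_bit=1.50, error_free=False),
]

if __name__ == "__main__":
    print("GG:  ", select_gg(EXAMPLE_MODES).name)
    print("EG:  ", select_eg(EXAMPLE_MODES).name)
    print("GAEG:", select_gaeg(EXAMPLE_MODES, k=1.05).name)
```

With the made-up numbers, the three rules pick three different modes, which illustrates how a small energy margin k can buy back goodput relative to the purely energy-guided choice.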


7.3 Results for a Fixed Frame Length 

In Fig. 7 to Fig. 9 we compare the impact of the proposed
modifications in conjunction with the considered
RA strategies on absolute and relative energy per bit,
as well as on goodput.

We compare all obtained curves to the reference curve
corresponding to a GG RA without the PHY layer
modifications proposed in Section 6. This baseline
implementation applies no DVFS and always employs 4
receive chains, resulting in the worst energy consumption
per successfully received bit, as shown in Fig. 7. However,
this GG approach leads to the highest goodput
over the entire SNR range, as shown in Fig. 9.

- With PHY modifications only, under a GG RA, no
  significant energy-efficiency gains can be achieved over
  the baseline implementation. In particular, GG RA never
  sacrifices error rate performance by reducing the number
  of active antennas or by using a sub-optimal MIMO
  detector. Hence, the only PHY modification that can be
  applied is DVFS in the channel coding module. Since
  channel decoding accounts for only a small fraction of the
  overall receiver power consumption, we obtain only a
  marginal energy reduction of 5%, as shown in Fig. 8.

- By changing the RA to EG RA, while still applying
  DVFS only in the channel decoding and without the
  capability to scale the number of active receive chains,
  the energy per bit is reduced by up to 15% compared
  to the baseline case. Although the EG RA improves
  the energy-efficiency compared to the baseline
  implementation and the previously considered case, the
  restriction to exactly 4 active receive antennas does
  not reveal the maximum gains. Note that the goodput
  does not deteriorate significantly compared to
  the baseline implementation.

- The maximum average efficiency gains of up to 44%
  are obtained with the joint application of all proposed
  PHY layer modifications under an EG RA
  (DVFS applied in the channel coding module along
  with DVFS based on active receive chain selection,
  as well as selection of an appropriate MIMO detector
  for the specific channel condition). However,
  such savings come at the cost of a reduced goodput,
  as shown in Fig. 9. In any case, most applications,
  such as today's compressed HD videos on popular
  video-on-demand platforms or high quality audio
  streams, require a much lower data rate than the
  one achieved by our approach. Hence, the proposed
  approach can still satisfy the data rate requirements
  of most popular applications while significantly
  improving the energy-efficiency of the battery operated
  device.

- For applications that require a much higher goodput and
  can accept a minor energy-efficiency sacrifice, our
  results show that GAEG RA can provide a reasonable
  compromise between EG RA and GG RA. As shown in
  Fig. 7 and Fig. 8, GAEG RA with a factor k = 1.05 or
  k = 1.10 reduces the efficiency only slightly compared
  to the best achievable gain in the case of EG RA.
  However, in the high SNR region (i.e., the region where
  high data rates can be achieved) the maximum achievable
  goodput is restored to 97% of the goodput
  achieved by the GG RA scheme, as shown in Fig. 9.

In order to show the efficacy of the EG RA, Tbl. 1
and Tbl. 2 show the percentage distribution of
the selected modes and the corresponding energy
consumption per successfully received bit for the different
SNR values. At 15 dB SNR, the EG RA chooses
the linear MMSE MIMO detector most of the time,
because the LRALD has a higher computational complexity
and is therefore more power hungry, but rarely
improves the reception of the frames under such channel
conditions. However, in the high-SNR regime, where
ill-conditioned channels have a more prominent impact,
the choice of LRALD is still preferable for the EG RA
in many cases, as shown in Tbl. 2.
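
The averages reported at the bottom of Tbl. 1 and Tbl. 2 appear consistent with a mean of the per-mode energies weighted by how often each mode is selected. The sketch below checks this for the GG RA columns, using the values as printed in the tables. The function names are illustrative, and the weighting rule is an inference from the numbers, not a statement of how the authors computed them.

```python
# Sketch: check that the table averages match a selection-weighted mean.
# Rows are (energy [nJ/bit], share of selections [%]) copied from the
# GG RA columns of Tbl. 1 (15 dB SNR) and Tbl. 2 (30 dB SNR).

GG_15_DB = [(4.21, 65.0), (3.58, 6.8), (3.46, 0.5), (5.13, 1.0),
            (4.39, 2.4), (3.58, 16.0), (3.88, 8.3)]
GG_30_DB = [(3.07, 1.0), (2.92, 71.8), (2.90, 0.5), (2.80, 26.7)]


def weighted_average(rows):
    """Mean energy per bit, weighted by the selection share of each mode."""
    total_share = sum(share for _, share in rows)
    return sum(energy * share for energy, share in rows) / total_share


def relative_saving(baseline: float, improved: float) -> float:
    """Fraction of energy saved when going from baseline to improved."""
    return 1.0 - improved / baseline


if __name__ == "__main__":
    print(f"GG RA, 15 dB: {weighted_average(GG_15_DB):.2f} nJ/bit")
    print(f"GG RA, 30 dB: {weighted_average(GG_30_DB):.2f} nJ/bit")
    # EG RA averages as reported in the tables: 2.83 and 1.83 nJ/bit.
    print(f"saving, 15 dB: {100 * relative_saving(4.05, 2.83):.1f}%")
    print(f"saving, 30 dB: {100 * relative_saving(2.89, 1.83):.1f}%")
```

Using the reported averages, the EG RA with variable antennas saves roughly 30% at 15 dB and roughly 37% at 30 dB relative to the GG RA columns of the same tables.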


7.4 Results for Varying Frame Lengths 

In order to show the relative energy improvement for
the overall battery operated device in an office environment,
we consider the case of aggregated frames with
the IEEE TGnC [10] channel model, as described in the
IEEE 802.11n standard. To this end, we evaluate in
Fig. 10 and Fig. 11, for varying frame sizes, the achieved
energy-efficiency and goodput of the proposed PHY
layer modifications under an EG RA scheme, compared
with the ones achieved by the baseline implementation.
For this setup, we allow the AP to aggregate up to 16
frames. Both the GG RA and the EG RA are allowed to
select the number of aggregated frames for each transmission
freely such that the resulting total frame length
$L$ is between 1.5 kB and $L_{max} \le 24$ kB, while each of
the aggregated frames is acknowledged independently
in a single ACK frame.
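
The aggregation setup can be illustrated with a small helper that lists the admissible total frame lengths and shows how a fixed per-transmission overhead is amortized over longer frames. This is a sketch under stated assumptions: 1 kB is taken as 1000 bytes, and the overhead duration and data rate in the example are hypothetical values that do not come from the paper.

```python
# Sketch: admissible aggregated frame lengths and overhead amortization.
# Assumptions: 1 kB = 1000 bytes; overhead and rate values are hypothetical.

FRAME_KB = 1.5    # length of a single frame, from the text
MAX_FRAMES = 16   # maximum number of aggregated frames, from the text


def allowed_lengths_kb(l_max_kb: float):
    """Total lengths L = n * 1.5 kB with 1 <= n <= 16 and L <= L_max."""
    return [n * FRAME_KB for n in range(1, MAX_FRAMES + 1)
            if n * FRAME_KB <= l_max_kb]


def payload_fraction(length_kb: float, rate_mbps: float,
                     overhead_us: float) -> float:
    """Share of the airtime spent on payload for a fixed overhead."""
    payload_us = length_kb * 8e3 / rate_mbps  # bits / (bits per microsecond)
    return payload_us / (payload_us + overhead_us)


if __name__ == "__main__":
    for length in allowed_lengths_kb(24.0):
        frac = payload_fraction(length, rate_mbps=300.0, overhead_us=40.0)
        print(f"L = {length:4.1f} kB: payload share {100 * frac:5.1f}%")
```

The payload share grows with the aggregated length, which is the motivation for aggregation given in footnote 4.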

The resulting relative energy consumption with respect
to the SNR and the maximum frame length $L_{max}$
(which is an integer multiple of 1.5 kB) is depicted in
Fig. 10. It can be observed that the best relative energy
consumption is achieved for short frames and moderate
to high SNR, reaching up to 36.6%. In this moderate to
high SNR region the RA can take advantage of many
operation modes with $P_e(\nu) \approx 0$. It is worth mentioning
that the relative energy consumption of the proposed
system is in all regions at least 13.3% better than
that of the baseline implementation. This indicates that
our method can indeed improve the energy-efficiency of
mobile clients for realistic channel models.

Table 1: Experimental results at 15 dB SNR

        GG RA                   EG RA (var. ant.)
  MCS  [nJ/bit]     %      MCS  alg.    [nJ/bit]     %
   12    4.21     65.0      12  MMSE      3.85      0.5
   13    3.58      6.8       3  MMSE      3.27     14.6
   14    3.46      0.5       4  MMSE      2.49     22.3
   18    5.13      1.0       5  MMSE      2.07      4.9
   19    4.39      2.4       6  MMSE      1.94      1.0
   20    3.58     16.0      10  MMSE      3.44      1.5
   27    3.88      8.3      11  MMSE      2.81     31.1
                            12  MMSE      2.23      2.7
                            18  MMSE      3.38      8.2
                            19  MMSE      2.88      8.2
                            26  MMSE      3.62      1.0
                            27  MMSE      3.09      2.9
                            14  LRALD     3.56      1.0
  average: 4.05 nJ/bit     average: 2.83 nJ/bit

Table 2: Experimental results at 30 dB SNR

        GG RA                   EG RA (var. ant.)
  MCS  [nJ/bit]     %      MCS  alg.    [nJ/bit]     %
   22    3.07      1.0       8  MMSE      1.85      7.3
   23    2.92     71.8      14  MMSE      1.93      1.0
   29    2.90      0.5      15  MMSE      1.89      3.9
   31    2.80     26.7      16  MMSE      1.85      7.3
                            23  MMSE      1.92      0.5
                            24  MMSE      1.82     48.5
                             4  LRALD     1.85     12.1
                            11  LRALD     1.89      1.9
                            12  LRALD     1.82     17.0
                            20  LRALD     1.90      0.5
  average: 2.89 nJ/bit     average: 1.83 nJ/bit

Fig. 10: [...] maximum frame lengths and different channel quality.

Fig. 11: Throughput of the proposed optimized system.

Finally, we show the impact of the EG RA on the
achieved goodput in Fig. 11. While we expected a reduced
goodput for non-aggregated frames, our data
shows that, for aggregated frames, EG RA achieves
a goodput close to the goodput achievable by GG
RA. Therefore, there is no need for a GAEG RA and
any compromise in terms of energy-efficiency can be
avoided.


8 Conclusion 

In this paper, we depart from conventional goodput
optimized communication systems and present a cross-layer
approach that involves various design, analysis
and run-time techniques for improving the
energy-efficiency of packet based communication systems. The
proposed approach considers all blocks within the PHY
layer of battery operated devices that participate in the
communication link and optimizes the energy-efficiency
by jointly considering the system level and hardware
level parameters within the MAC and PHY layer.

The energy efficiency optimization is achieved with
the introduction of an energy-guided rate adaptation
scheme which is combined with several modifications
of the PHY layer to enable maximum energy savings.
Several new modes of operation with reduced receiver
complexity are added to further improve the energy
proportional behavior of the battery operated device.
Fine grained energy models of the PHY are developed for
propagating such behavior to the MAC layer and allowing
the energy-guided rate adaptation to select the
optimum system mode under any given condition and
requirement.

The numerical results indicate that our approach
can achieve up to 44% energy efficiency improvement
in an IEEE 802.11n system. This is achieved by properly
selecting the system mode along with the right degree
of voltage and frequency scaling and the selection
of the appropriate algorithm for data processing when
possible. We further show that a goodput-aware
energy-guided rate adaptation can provide a reasonable
compromise between energy and goodput if the goodput
achieved by the energy-guided rate adaptation is
insufficient. In any case, our work encourages a paradigm
shift towards an energy-guided RA with a properly
energy proportional PHY layer, thus motivating further
work in this direction.